Vocabulary and Environment Adaptation in Vocabulary-Independent Speech Recognition
Authors
Abstract
In this paper, we examine adaptation issues in vocabulary-independent (VI) systems. Just as with speaker adaptation in speaker-independent systems, two vocabulary adaptation algorithms [5] are implemented to tailor the VI subword models to the target vocabulary. The first algorithm generates vocabulary-adapted clustering decision trees by focusing on relevant allophones during tree generation, and reduces the VI error rate by 9%. The second algorithm, vocabulary-bias training, gives the relevant allophones more prominence by assigning more weight to them during Baum-Welch training of the generalized allophonic models, and reduces the VI error rate by 15%. Finally, to overcome the degradation caused by the different acoustic environments used for VI training and testing, CDCN and ISDCN, originally designed for microphone adaptation, are incorporated into our VI system; both reduce the degradation of VI cross-environment recognition by 50%.
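As a rough illustration of the vocabulary-bias idea, the sketch below re-estimates a single scalar Gaussian mean from occupancy-weighted observations, scaling the counts of frames that belong to allophones relevant to the target vocabulary. The abstract does not give the weighting scheme, so the function name `biased_mean`, the `bias` factor, and the frame layout are assumptions made for illustration only, not the authors' implementation.

```python
from typing import Iterable, Set, Tuple

# Each frame: (allophone label, state-occupancy posterior, scalar observation).
Frame = Tuple[str, float, float]


def biased_mean(frames: Iterable[Frame], relevant: Set[str], bias: float = 2.0) -> float:
    """Occupancy-weighted mean for one shared model.

    Frames whose allophone label is in `relevant` have their occupancy
    count multiplied by `bias` (a hypothetical parameter), so they pull
    the re-estimated mean toward the target vocabulary's allophones.
    """
    numerator = 0.0
    denominator = 0.0
    for label, gamma, x in frames:
        weight = gamma * (bias if label in relevant else 1.0)
        numerator += weight * x
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no occupancy mass accumulated")
    return numerator / denominator


frames = [("a", 1.0, 0.0), ("b", 1.0, 3.0)]
unbiased = biased_mean(frames, relevant=set(), bias=2.0)
biased = biased_mean(frames, relevant={"b"}, bias=2.0)
```

With `bias=1.0` this reduces to the ordinary occupancy-weighted mean update; larger values shift the estimate toward the relevant allophones.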
Similar resources
MLLR method for Environmental Adaptation in a Continuous Farsi Speech Recognition
In this paper, MLLR adaptation of continuous-density HMMs is investigated in a Farsi speaker-independent large-vocabulary continuous speech recognition system in an attempt to improve the recognition rate in real-world situations. In the MLLR framework, we have experimented with the use of Gaussian mean transformations in global adaptation and regression-tree-based adaptation. Besides full and block-diagonal...
Very fast adaptation for large vocabulary continuous speech recognition using eigenvoices
The principle of the eigenvoice method (using a priori knowledge of speaker variability, collected during training, for very fast adaptation) is applied to continuous speech recognition with a large vocabulary. The handling of mixture-density HMM models is discussed. For the case of gender-independent models, a decrease in the word error rate of up to 15% is observed for unsupervised ...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving from this archive, is one of the key issues for this broadcasting...
Speaker Adaptation Using Projection to Latent Structure Algorithm
Correlation between observations of different states is important a priori information reflecting speech characteristics, and is a key factor in improving the robustness of speech recognition systems. Since speech and noise are statistically independent, correlation information can be used to reduce the effect of noise on speech recognition performance in noisy environments. This paper proposed a new speaker ...
Towards non-stationary model-based noise adaptation for large vocabulary speech recognition
Recognition rates of speech recognition systems are known to degrade substantially when there is a mismatch between training and deployment environments. One approach to tackling this problem is to transform the acoustic models based on the channel distortion and noise characteristics of the new environment. Currently, most model adaptation strategies assume that the noise characteristics are s...